Abstract

Recently, we adapted random walk arguments based on work of Nachmias
and Peres, Martin-Löf, Karp and Aldous to give a simple proof of the
asymptotic normality of the size of the giant component in the random
graph $G(n,p)$ above the phase transition. Here we show that the same
method applies to the analogous model of random $k$-uniform hypergraphs,
establishing asymptotic normality throughout the (sparse) supercritical
regime. Previously, asymptotic normality was known only towards the
two ends of this regime.

1 Introduction and results 

Let $H_k(n,p)$ denote the random $k$-uniform hypergraph with vertex set $[n] =
\{1,2,\ldots,n\}$ in which each of the $\binom{n}{k}$ possible edges is present independently
with probability $p$. Thus $H_2(n,p)$ is the classical random graph $G(n,p)$. Our
aim here is to study the component structure of $H_k(n,p)$, in particular the
distribution of the size of the largest component above the 'phase transition'.

Before turning to the details, let us note that the notion of a 'component'
in a $k$-uniform hypergraph can be interpreted in a number of ways. For
any $1 \le r \le k-1$, one could consider two edges to be 'connected' if they
share at least $r$ vertices (or perhaps exactly $r$ vertices), and use this notion to
define the components of $H_k$. Moreover, the size of a component could then
be measured in a number of ways: either by the number of vertices that it
contains, or (probably more naturally) by the number of $r$-sets of vertices, or
the number of edges. In the rest of the paper we consider $r = 1$. It seems that
other values of $r$ have received very little attention, although the corresponding
notions of 'cycle' (with $r = 1$ being 'loose' and $r = k-1$ being 'tight') have
been studied extensively. Note that for hypergraphs derived from $k$-cliques
in random graphs, questions about components defined using $r = k-1$ were
raised by Derényi, Palla and Vicsek [7]; such components were studied for all
$1 \le r \le k-1$ in [3]. We shall say nothing further about the case $r \ge 2$, except to
note that (greatly simplified versions of) the branching process arguments in [3]
presumably show that the threshold for the emergence of a giant component is
at $p \sim n^{r-k}(k-r)!/\bigl(\binom{k}{r}-1\bigr)$.

For the rest of the paper we take $r = 1$, i.e., we say that two vertices are
connected in a $k$-uniform hypergraph $H$ if they are connected in the graph
obtained by replacing each edge by a copy of $K_k$, and take the components of
$H$ to be the maximal sub-hypergraphs in which all vertices are connected. For
reasons that will become clear below we write $p = p(n)$ as $\lambda (k-2)!\, n^{-k+1}$, where
$\lambda = \lambda(n)$; when $k = 2$ this reduces to $p = \lambda/n$. Our main aim is to study the
number $L_1$ of vertices in the largest component of $H_k(n,p)$ in the supercritical
regime, i.e., when $(\lambda-1)n^{1/3} \to \infty$. We also prove a result for the critical
regime, where $(\lambda-1)n^{1/3}$ is bounded.

For $k = 2$, very detailed results of this type are known. Pittel and Wormald [13]
and Luczak and Łuczak [10] showed, in each case as part of a much stronger
and/or more general result, that throughout the supercritical regime, $L_1$ is
asymptotically normal: centralized and scaled appropriately, it converges in
distribution to a standard normal distribution. The special case where $\lambda > 1$
is constant was proved earlier by Stepanov [14]. For hypergraphs, where $k \ge 3$
is fixed, much less is known: Karoński and Łuczak [8] proved strong results (a
local limit theorem) in the barely supercritical phase, when $(\lambda-1)^3 n$ tends to
infinity but more slowly than $\log n/\log\log n$. At the other end of the range,
Behrisch, Coja-Oghlan and Kang [2] proved a local limit theorem when $\lambda > 1$
is fixed. Here we shall prove asymptotic normality throughout the supercritical
regime, for all $k \ge 3$ fixed. Note that our main result, while less precise than
those of [8, 2], has a much greater range of applicability. The proof is a (to us
surprisingly) simple adaptation of the argument we gave for the case $k = 2$ in [3],
itself based on exploration and martingale arguments using ideas of Nachmias
and Peres [12], Martin-Löf [11], Karp [9] and Aldous [1].

Given $\lambda > 1$, let $\lambda_*$ be the 'dual branching process parameter', defined by
$\lambda_* < 1$ and

\[ \lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}. \]

Writing $X_\mu$ for the Galton-Watson branching process in which the offspring
distribution is Poisson with mean $\mu$, it is well known that conditioning $X_\lambda$ on
extinction gives $X_{\lambda_*}$. Let $\rho_\lambda = \rho_{2,\lambda}$ denote the survival probability of $X_\lambda$, so
$\rho_\lambda > 0$ may be defined by

\[ 1 - \rho_\lambda = e^{-\lambda \rho_\lambda} \tag{1} \]
and satisfies $\lambda_* = \lambda(1-\rho_\lambda)$. Finally, for $k \ge 3$ define $\rho_{k,\lambda}$ by

\[ 1 - \rho_{k,\lambda} = (1-\rho_\lambda)^{1/(k-1)}; \tag{2} \]
it is easy to see that $\rho_{k,\lambda}$ is the survival probability of a certain branching process
naturally associated to $H_k(n,p)$, where $p = \lambda (k-2)!\, n^{-k+1}$.
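
To make definitions (1) and (2) concrete, the following minimal Python sketch evaluates $\rho_\lambda$, $\lambda_*$ and $\rho_{k,\lambda}$ numerically. The function names (`rho_graph`, `lam_star`, `rho_hyper`) and the choice of bisection are ours, purely for illustration; the sketch assumes $\lambda > 1$ (not too close to 1) and uses the relation $\lambda_* = \lambda(1-\rho_\lambda)$.

```python
import math


def rho_graph(lam, iters=200):
    """Positive root of 1 - rho = exp(-lam * rho), i.e. equation (1), for lam > 1."""
    if lam <= 1:
        raise ValueError("this sketch assumes lam > 1")
    # h(r) = 1 - r - exp(-lam * r) is positive on (0, rho) and negative on (rho, 1].
    h = lambda r: -math.expm1(-lam * r) - r
    lo, hi = 1e-12, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def lam_star(lam):
    """Dual parameter, computed as lam * (1 - rho_lam)."""
    return lam * (1 - rho_graph(lam))


def rho_hyper(k, lam):
    """rho_{k,lam} from equation (2): 1 - rho_{k,lam} = (1 - rho_lam)^(1/(k-1))."""
    return 1 - (1 - rho_graph(lam)) ** (1 / (k - 1))


if __name__ == "__main__":
    lam = 2.0
    ls = lam_star(lam)
    print("rho_lambda      =", rho_graph(lam))
    print("lambda_*        =", ls)
    print("duality residual =", ls * math.exp(-ls) - lam * math.exp(-lam))
    print("rho_{3,lambda}  =", rho_hyper(3, lam))
```

The duality residual printed above should be numerically negligible, which is a quick way to confirm that $\lambda_* = \lambda(1-\rho_\lambda)$ does satisfy $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$.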

As usual, we say that a sequence $(E_n)$ of events holds with high probability
or whp if $\mathbb{P}(E_n) \to 1$ as $n \to \infty$. If $(X_n)$ is a sequence of random variables
and $f(n)$ is a deterministic function, then $X_n = o_p(f(n))$ means that $X_n/f(n)$
converges to 0 in probability, i.e., that for any constant $\varepsilon > 0$, $|X_n| \le \varepsilon f(n)$
holds whp. Later, we shall also use $X_n = O_p(f(n))$ to mean that $X_n/f(n)$ is
bounded in probability, i.e., for any $\varepsilon > 0$ there is a $C$ such that for all (large
enough) $n$ we have $\mathbb{P}(|X_n| \ge C f(n)) \le \varepsilon$.

Writing $L_1(H)$ for the maximum number of vertices in any component of
a hypergraph $H$, Coja-Oghlan, Moore and Sanwalani [5] showed that if $k \ge 3$
and $\lambda > 1$ are fixed and $p = p(n) = \lambda (k-2)!\, n^{-k+1}$, then $L_1(H_k(n,p)) =
\rho_{k,\lambda} n + o_p(n)$. (This result was certainly known as 'folklore' before this.) Our
main result concerns the limiting distribution of the $o_p(n)$ term, and applies
throughout the supercritical regime.

Given $k \ge 2$ and $\lambda > 1$, let

\[ \sigma_{k,\lambda}^2 = \frac{\lambda(1-\rho)^2 - \lambda_*(1-\rho) + \rho(1-\rho)}{(1-\lambda_*)^2}\, n, \tag{3} \]

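where $\rho = \rho_{k,\lambda}$. It is well known that when $\lambda = 1+\varepsilon$ and $\varepsilon \to 0$, then $\rho_\lambda \sim 2\varepsilon$.
From (2) it follows that

\[ \rho_{k,\lambda} \sim \frac{2\varepsilon}{k-1}, \]

and thus $\sigma_{k,\lambda}^2 \sim 2\varepsilon^{-1} n$. Expanding $\lambda_*$ and thus $\rho_\lambda$ and hence $\rho_{k,\lambda}$ further as
series in $\varepsilon = \lambda - 1$, it is easy to check that in fact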



\[ \sigma_{k,\lambda}^2 = \Bigl( 2\varepsilon^{-1} + 2 - \frac{6}{k-1} + O(\varepsilon) \Bigr) n. \]



Thus, although the leading term does not depend on fc, the next term does. 
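
The asymptotics of the variance can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $\sigma_{k,\lambda}^2/n$ to be $\bigl(\lambda(1-\rho)^2 - \lambda_*(1-\rho) + \rho(1-\rho)\bigr)/(1-\lambda_*)^2$, the form consistent with the variance computation in the proof of Theorem 1, and the helper names `rho_graph` and `sigma2_over_n` are ours. The second printed column should approach $2 - 6/(k-1)$ if the expansion $\sigma_{k,\lambda}^2 = (2\varepsilon^{-1} + 2 - 6/(k-1) + O(\varepsilon))n$ holds; that expansion comes from a short series computation of our own and is treated here as a claim to be tested.

```python
import math


def rho_graph(lam, iters=200):
    """Positive root of 1 - rho = exp(-lam * rho), for lam > 1, by bisection."""
    h = lambda r: -math.expm1(-lam * r) - r  # equals 1 - r - exp(-lam * r)
    lo, hi = 1e-12, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sigma2_over_n(k, lam):
    """sigma_{k,lam}^2 / n, using the assumed form of (3) described in the lead-in."""
    rho2 = rho_graph(lam)                      # rho_lambda
    lam_star = lam * (1 - rho2)                # dual parameter
    rho = 1 - (1 - rho2) ** (1 / (k - 1))      # rho_{k,lambda}, equation (2)
    numerator = lam * (1 - rho) ** 2 - lam_star * (1 - rho) + rho * (1 - rho)
    return numerator / (1 - lam_star) ** 2


if __name__ == "__main__":
    eps = 1e-3
    for k in (2, 3, 5, 10):
        s = sigma2_over_n(k, 1 + eps)
        # First column: ratio to the leading term 2/eps; second: the next term.
        print(k, s * eps / 2, s - 2 / eps)
```

For $k = 2$ the first two terms of the numerator cancel, since $\lambda_* = \lambda(1-\rho_\lambda)$, and the expression reduces to $\rho(1-\rho)/(1-\lambda_*)^2$.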
Theorem 1. Let $k \ge 3$ be fixed, and let

\[ p = p(n) = \lambda (k-2)!\, n^{-k+1}, \]
where $\lambda = \lambda(n)$ is bounded and $(\lambda-1)^3 n \to \infty$. Then

\[ \frac{L_1(H_k(n,p)) - \rho_{k,\lambda} n}{\sigma_{k,\lambda}} \xrightarrow{d} N(0,1), \]

where $\xrightarrow{d}$ denotes convergence in distribution, $N(0,1)$ is a standard normal
random variable, and $\rho_{k,\lambda}$ and $\sigma_{k,\lambda}$ are defined in (2) and (3).

As a by-product of our proof, we obtain an analogue of the result of Aldous [1]
giving the limiting distribution of the rescaled large component sizes inside the
scaling window of the phase transition. Define a stochastic process $W^a(s)$ (a
random function on $[0,\infty)$) by

\[ W^a(s) = W(s) + as - s^2/2, \]
where $W(s)$ is a standard Brownian motion. As in [1], define an excursion
of this process to be a maximal interval on which $W^a$ exceeds its previous
minimum value, and let $(|\gamma_i|)_{i \ge 1}$ denote the lengths of the excursions sorted
into decreasing order. (Aldous shows that this makes sense with probability 1.)

Theorem 2. Let $k \ge 3$ be fixed, and let $p = p(n) = \lambda (k-2)!\, n^{-k+1}$ where
$\lambda = \lambda(n)$ satisfies

\[ (\lambda-1)^3 n \to (k-1)^2 a^3 \]

for some $a \in \mathbb{R}$. Then, for any fixed $r$, writing $L_i$ for the number of vertices
in the $i$th largest component of $H_k(n,p)$, the sequence $\bigl((k-1)^{1/3} n^{-2/3} L_i\bigr)_{i=1}^{r}$
converges in distribution to $(|\gamma_i|)_{i=1}^{r}$, where $|\gamma_i|$ is defined as above.

The slightly strange scaling is chosen to match the graph case: the conclusion
is that under these assumptions, up to a $(k-1)^{1/3}$ scaling factor, the large
component sizes have the same limiting distribution as in the random graph
$G(n, (1 + a n^{-1/3})/n)$.

2 Proofs 

In this section we shall prove Theorems 1 and 2. The arguments, which closely
follow those in [4], require a little preparation.

Let $H = H_k(n,p)$ be the random $k$-uniform hypergraph defined in the
introduction. Our proofs are based on an algorithm for 'exploring' the components
of $H$ in $n$ steps. For $0 \le t \le n$, 'time $t$' refers to the situation after $t$ steps, so
step $t$ goes from time $t-1$ to time $t$. In step $t$ we shall 'explore' a vertex $v_t$,
meaning that we reveal all edges incident with $v_t$ but not with any previously
explored vertices. Noting that one vertex is explored in each step, this means
that, however $v_t$ is chosen, each of the $\binom{n-t}{k-1}$ possible edges containing $v_t$ and
not containing any previously explored vertices will be present with probability
$p$, independently of the others and of the history.

More precisely, as in [3], at time $t$ every vertex is either 'explored', 'active'
or 'unseen'. We write $A_t$ and $U_t$ for the numbers of active and unseen vertices;
exactly $t$ vertices will be explored by time $t$, so $A_t + U_t = n - t$. At time
$t = 0$, we have $A_0 = 0$ and $U_0 = n$. Fix an order on the vertices. In step
$1 \le t \le n$ we choose $v_t$ to be the first active vertex (at time $t-1$), if there are
any; otherwise $v_t$ is the first unseen vertex. In the latter case we say that we
'start a new component' in step $t$. In step $t$ we reveal all edges containing $v_t$ and
not containing any explored vertex. Let $\eta_t$ be the number of unseen vertices
other than $v_t$ in such edges. These $\eta_t$ vertices are now labelled active (at time
$t$), and $v_t$ is labelled as explored. It is easy to check that the process reveals
the components of $H$ one-by-one, starting a new component in step $t$ whenever
$A_{t-1} = 0$. Thus, if $0 = t_0 < t_1 < t_2 < \cdots < t_\ell = n$ enumerates $\{t : A_t = 0\}$,
then the sequence $(t_i - t_{i-1})_{i=1}^{\ell}$ lists exactly the numbers of vertices in the
components of $H$, in some order. In particular,

\[ L_1 = \max\{t_i - t_{i-1} : 1 \le i \le \ell\}. \tag{4} \]
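
The exploration just described can be sketched in a few lines of Python. This is a small-scale illustration, not an efficient implementation: it samples all $\binom{n}{k}$ potential edges up front instead of revealing them step by step, and the names `sample_hypergraph`, `explore` and `component_sizes` are hypothetical helpers introduced here. The union-find routine serves only as an independent cross-check of the component sizes.

```python
import itertools
import random
from collections import Counter


def sample_hypergraph(n, k, p, rng):
    """Include each k-subset of {0, ..., n-1} independently with probability p."""
    return [e for e in itertools.combinations(range(n), k) if rng.random() < p]


def explore(n, edges):
    """Run the exploration; return component sizes in the order they are completed."""
    incident = [[] for _ in range(n)]
    for e in edges:
        for v in e:
            incident[v].append(e)

    UNSEEN, ACTIVE, EXPLORED = 0, 1, 2
    state = [UNSEEN] * n
    sizes, last_zero = [], 0  # last_zero: latest time t with A_t = 0
    for t in range(1, n + 1):
        active = [v for v in range(n) if state[v] == ACTIVE]
        # First active vertex if any; otherwise the first unseen vertex (new component).
        v = active[0] if active else state.index(UNSEEN)
        for e in incident[v]:
            # Only edges avoiding all previously explored vertices are revealed now.
            if all(state[u] != EXPLORED for u in e):
                for u in e:
                    if u != v and state[u] == UNSEEN:
                        state[u] = ACTIVE
        state[v] = EXPLORED
        if ACTIVE not in state:  # A_t = 0: a component has just been completed
            sizes.append(t - last_zero)
            last_zero = t
    return sizes


def component_sizes(n, edges):
    """Sorted component sizes via union-find, as an independent check."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for e in edges:
        root = find(e[0])
        for v in e[1:]:
            parent[find(v)] = root
    return sorted(Counter(find(v) for v in range(n)).values())


if __name__ == "__main__":
    rng = random.Random(1)
    n, k, lam = 60, 3, 1.5
    p = lam / n ** 2  # lam * (k-2)! * n^(-k+1) with k = 3
    sizes = explore(n, sample_hypergraph(n, k, p, rng))
    print("component sizes:", sizes, " L_1 =", max(sizes))
```

The differences between successive times with $A_t = 0$ are exactly the entries of `sizes`, so `max(sizes)` plays the role of $L_1$ in (4).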
We shall study the random walk $(X_t)$ defined by $X_t = A_t - C_t$, where $C_t$
is the number of new components started within the first $t$ steps. As in [4], we
have $t_i = \inf\{t : X_t = -i\}$. In step $t$, exactly $\eta_t$ vertices change state from
unseen to active. Furthermore, one vertex $v_t$ changes its state to explored: if a
new component is started in step $t$, then $v_t$ was previously unseen, otherwise it
was active. Hence, if we start a new component, then $A_t = A_{t-1} + \eta_t$, otherwise
$A_t = A_{t-1} + \eta_t - 1$. Since $A_0 = C_0 = 0$, it follows that

\[ X_t = A_t - C_t = \sum_{i=1}^{t} (\eta_i - 1). \]

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space supporting our random hypergraph
$H_k(n,p)$. (For example, take $\Omega$ to be the set of all $2^{\binom{n}{k}}$ $k$-uniform hypergraphs
on $[n]$, $\mathcal{F}$ to be the power-set of $\Omega$, and $\mathbb{P}$ to be the appropriate probability
measure.) Let $\mathcal{F}_t \subseteq \mathcal{F}$ denote the sub-sigma-field generated by all information
revealed by time $t$. Following the strategy of [1], the key task is to understand
the distribution of $\eta_{t+1}$ given $\mathcal{F}_t$. Crucially, it is only the expectation that we
need to bound precisely; our bound on the variance can be much cruder.

Given $\mathcal{F}_t$, we know which vertex $v_{t+1}$ we are about to explore in step $t+1$.
At time $t$ there are $U'_t = U_t - 1_{\{A_t = 0\}} = U_t - (C_{t+1} - C_t)$ unseen vertices $u$
other than $v_{t+1}$. (The vertex $v_{t+1}$ is active if $A_t > 0$; otherwise it is unseen.)
For each such unseen vertex $u$ there are exactly

\[ c_{t+1} = \binom{A_t + U_t - 2}{k-2} = \binom{n-t-2}{k-2} \tag{5} \]

potential edges containing $v_{t+1}$ and $u$ but not containing any of the $t$ vertices
previously explored. Since each such edge is present with probability $p$, the
probability that $u$ becomes active during step $t+1$ is

\[ \pi_1 = 1 - (1-p)^{c_{t+1}} = p c_{t+1} + O(p^2 c_{t+1}^2) = \lambda (n-t)^{k-2} n^{-k+1} + O(1/n^2). \]

Of course, these events are not independent for different u, but this does not 
matter for the expectation. In particular, 

\[ \mathbb{E}(\eta_{t+1} \mid \mathcal{F}_t) = U'_t \pi_1 = U'_t p c_{t+1} + O(1/n). \tag{6} \]

Fortunately for the subsequent analysis, this expression depends on $U'_t$ in a
linear way.

We next estimate the conditional variance of $\eta_{t+1}$ given $\mathcal{F}_t$; here we do not
need to be so accurate. Let $u_1$ and $u_2$ be distinct unseen vertices (other than
$v_{t+1}$ if that happens to be unseen). The probability that $u_1$ and $u_2$ both become
active is $\pi_2 + \pi_3$, where $\pi_2$ is the probability that we find an edge containing
$v_{t+1}$, $u_1$ and $u_2$, and $\pi_3$ is the probability that this does not happen, but we
find disjoint edges activating $u_1$ and $u_2$. Supposing that $n - t \to \infty$, then

\[ \pi_2 = 1 - (1-p)^{\binom{n-t-3}{k-3}} \sim p \binom{n-t-3}{k-3} \sim \lambda (k-2)(n-t)^{k-3} n^{-k+1}. \]
Also, it is easy to check that $\pi_3 \sim \pi_1^2$. Hence,

\[ \begin{aligned}
\mathrm{Var}(\eta_{t+1} \mid \mathcal{F}_t) &= U'_t(U'_t - 1)(\pi_2 + \pi_3) + U'_t \pi_1 - (U'_t \pi_1)^2 \\
&\sim (U'_t)^2 \pi_2 + U'_t \pi_1 \\
&\sim \lambda (k-2)(1 - t/n)^{k-3} \frac{U_t^2}{n^2} + \lambda (1 - t/n)^{k-2} \frac{U_t}{n}.
\end{aligned} \tag{7} \]

In particular, the maximum possible value satisfies

\[ \max_t \sup \mathrm{Var}(\eta_{t+1} \mid \mathcal{F}_t) \le \lambda (k-1) = O(1). \tag{8} \]

Let $D_t = \mathbb{E}(\eta_t - 1 \mid \mathcal{F}_{t-1})$. Recalling that $U_t = n - t - A_t = n - t - (X_t + C_t)$,
and noting that $U'_t = U_t - (C_{t+1} - C_t) = n - t - X_t - C_{t+1}$, from (6) we see
that

\[ D_{t+1} = p U'_t c_{t+1} - 1 + O(1/n) = \alpha_{t+1}(n - t - X_t - C_{t+1}) - 1 + O(1/n), \tag{9} \]
where

\[ \alpha_t = p c_t = p \binom{n-t-1}{k-2}. \]

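Set $\Delta_{t+1} = X_{t+1} - X_t - D_{t+1}$, so $\Delta_{t+1}$ is $\mathcal{F}_{t+1}$-measurable and $\mathbb{E}(\Delta_{t+1} \mid \mathcal{F}_t) = 0$,
by the definition of $D_{t+1}$. Note that

\[ \begin{aligned}
X_{t+1} &= X_t + D_{t+1} + \Delta_{t+1} \\
&= (1 - \alpha_{t+1}) X_t + \alpha_{t+1}(n - t) - 1 + \Delta_{t+1} - \alpha_{t+1} C_{t+1} + O(1/n).
\end{aligned} \tag{10} \]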

We shall approximate $(X_t)$ by the sum of a deterministic sequence and a
martingale. To this end, define a deterministic sequence $(x_t)$ by $x_0 = 0$ and

\[ x_{t+1} = (1 - \alpha_{t+1}) x_t + \alpha_{t+1}(n - t) - 1. \tag{11} \]
Subtracting (11) from (10) we see that

\[ X_{t+1} - x_{t+1} = (1 - \alpha_{t+1})(X_t - x_t) + \Delta_{t+1} - \alpha_{t+1} C_{t+1} + E_{t+1}, \tag{12} \]
where $E_{t+1}$ is an 'error term' with $E_{t+1} = O(1/n)$. Defining

\[ \beta_t = \prod_{i=1}^{t} (1 - \alpha_i), \]

the recurrence relation (12) may be easily solved to give

\[ X_t - x_t = \sum_{i=1}^{t} \frac{\beta_t}{\beta_i} (\Delta_i - \alpha_i C_i + E_i). \tag{13} \]

Note that $0 < \alpha_i < 1$ for each $i$, so the sequence $\beta_t$ is decreasing.
Motivated by this formula, let

\[ S_t = \sum_{i=1}^{t} \beta_i^{-1} \Delta_i, \tag{14} \]

so $(S_t)$ is a martingale with respect to $(\mathcal{F}_t)$, and set

\[ \tilde{X}_t = x_t + \beta_t S_t; \]
this is our desired approximation for $(X_t)$.
Lemma 3. For any $p = p(n) = O(n^{-k+1})$ we have

\[ |X_t - \tilde{X}_t| = O(t C_t / n), \]

uniformly in $1 \le t \le n$.

Proof. From (13) and the definition of $\tilde{X}_t$ we have

\[ X_t - \tilde{X}_t = \sum_{i=1}^{t} \frac{\beta_t}{\beta_i} (E_i - \alpha_i C_i). \]

The result follows from the fact that $\beta_t/\beta_i = \prod_{j=i+1}^{t} (1 - \alpha_j)$ lies between 0 and
1, the bounds $E_i, \alpha_i = O(1/n)$, and the fact that $C_i \le C_t$ for $i \le t$. □

We next analyze the deterministic trajectory $(x_t)$. Setting $x_t = n - t - y_t$,
the relation (11) can be rearranged to give $y_{t+1} = (1 - \alpha_{t+1}) y_t$. Since $y_0 =
n - x_0 = n$, we see that $y_t = n \beta_t$, so

\[ x_t = n - t - n \beta_t. \tag{15} \]

Recall that the $\alpha_i$ are $O(1/n)$. Since there are at most $n$ terms in the sum,
it follows that

\[ \log \beta_t = \sum_{i=1}^{t} \log(1 - \alpha_i) = -\sum_{i=1}^{t} \alpha_i + O(1/n). \]

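From the definition of $\alpha_i = p c_i$ we have

\[ \sum_{i=1}^{t} \alpha_i = p \sum_{i=1}^{t} \binom{n-i-1}{k-2} = p \left( \binom{n-1}{k-1} - \binom{n-t-1}{k-1} \right). \]

It follows that

\[ \begin{aligned}
\log \beta_t &= -p \frac{n^{k-1}}{(k-1)!} \bigl( 1 - (1 - t/n)^{k-1} \bigr) + O(1/n) \\
&= -\frac{\lambda}{k-1} \bigl( 1 - (1 - t/n)^{k-1} \bigr) + O(1/n).
\end{aligned} \tag{16} \]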
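
As a sanity check on (16), the exact product defining $\beta_t$ can be compared with the closed-form approximation for moderate $n$. The sketch below is illustrative only; the function names `log_beta` and `log_beta_limit` are ours, and it assumes $t \le n-1$ so that all binomial coefficients are well defined.

```python
import math


def log_beta(n, k, lam, t):
    """Exact log(beta_t) = sum_{i<=t} log(1 - alpha_i), alpha_i = p * C(n-i-1, k-2)."""
    p = lam * math.factorial(k - 2) * n ** (-(k - 1))
    return sum(math.log1p(-p * math.comb(n - i - 1, k - 2)) for i in range(1, t + 1))


def log_beta_limit(k, lam, t_over_n):
    """Closed form from (16), without the O(1/n) error term."""
    return -lam / (k - 1) * (1 - (1 - t_over_n) ** (k - 1))


if __name__ == "__main__":
    n, lam = 2000, 1.5
    for k in (3, 4):
        for t in (n // 4, n // 2, n - 1):
            exact = log_beta(n, k, lam, t)
            approx = log_beta_limit(k, lam, t / n)
            # The difference should be of order 1/n.
            print(k, t, exact, approx, n * (exact - approx))
```

The last printed column is $n$ times the discrepancy, which should stay bounded as $n$ grows, in line with the $O(1/n)$ error term.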
Define the function $g = g_{k,\lambda}$ by

\[ g_{k,\lambda}(\tau) = 1 - \tau - \exp\left( -\frac{\lambda}{k-1} \bigl( 1 - (1-\tau)^{k-1} \bigr) \right) \tag{17} \]
and (for compatibility with the notation in [3]) set

\[ f(t) = f_{n,k,\lambda}(t) = n g(t/n). \]

Then (15) and (16) imply that $x_t = f(t) + O(1)$, uniformly in $0 \le t \le n$. In
other words, the function $f$ or $g$ represents a (rescaled in the case of $g$) idealized
form of the deterministic approximation $(x_t)$ to $(X_t)$.

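From (1) and (2) it is easy to check that $\rho = \rho_{k,\lambda}$ satisfies $g(\rho) = 0$. Note
that

\[ g'(\tau) = -1 + \lambda (1-\tau)^{k-2} \exp\left( -\frac{\lambda}{k-1} \bigl( 1 - (1-\tau)^{k-1} \bigr) \right), \]

so $g'(0) = \lambda - 1$. Also, recalling that $(1-\rho)^{k-1} = 1 - \rho_\lambda = \lambda_*/\lambda$,

\[ g'(\rho) = -1 + \lambda (1-\rho)^{k-1} = -(1 - \lambda_*). \tag{18} \]

Furthermore,

\[ g''(\tau) = \Bigl( -\lambda (k-2)(1-\tau)^{k-3} - \bigl( \lambda (1-\tau)^{k-2} \bigr)^2 \Bigr) \exp\left( -\frac{\lambda}{k-1} \bigl( 1 - (1-\tau)^{k-1} \bigr) \right), \]

so $g'' \le 0$. Hence $g$ is concave, so $f$ is concave. Also, $\sup\{|g''(\tau)| : 0 \le \tau \le 1\} =
O(1)$, so $f''(t) = O(1/n)$, uniformly in $0 < t < n$.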
Note that

\[ g''(0) = -(\lambda (k-2) + \lambda^2) = -(k-1) + O(\varepsilon), \]

where $\varepsilon = \lambda - 1$. Since (as is easily checked) $g'''$ is uniformly bounded, it follows
that for $\tau = O(\varepsilon)$ we have

\[ g(\tau) = g(0) + \tau g'(0) + \tau^2 g''(0)/2 + O(\tau^3) = \varepsilon \tau - (k-1)\tau^2/2 + O(\varepsilon^3). \tag{19} \]
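
The properties of $g$ used above are easy to confirm numerically. In the sketch below (an illustration with our own function names `g`, `g_prime` and `positive_root`), $\rho_{k,\lambda}$ is located directly as the positive root of $g$ by bisection, which is justified by the concavity of $g$ together with $g(0) = 0$, $g'(0) > 0$ and $g(1) < 0$; the result is then checked against (1), (2) and (18). The sketch assumes $\lambda > 1$.

```python
import math


def g(tau, k, lam):
    """The function g_{k,lam} of (17)."""
    return 1 - tau - math.exp(-lam / (k - 1) * (1 - (1 - tau) ** (k - 1)))


def g_prime(tau, k, lam):
    """Derivative of g_{k,lam}."""
    return -1 + lam * (1 - tau) ** (k - 2) * math.exp(
        -lam / (k - 1) * (1 - (1 - tau) ** (k - 1))
    )


def positive_root(k, lam, iters=200):
    """Positive root of g on (0, 1): g > 0 to its left and g < 0 to its right."""
    lo, hi = 1e-9, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid, k, lam) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    k, lam = 3, 1.5
    rho = positive_root(k, lam)
    rho2 = 1 - (1 - rho) ** (k - 1)   # rho_lambda recovered via (2)
    lam_star = lam * (1 - rho2)       # dual parameter
    print("g(rho)            =", g(rho, k, lam))
    print("residual in (1)   =", (1 - rho2) - math.exp(-lam * rho2))
    print("g'(rho) + 1 - l_* =", g_prime(rho, k, lam) + (1 - lam_star))
    print("g'(0) - (lam - 1) =", g_prime(0.0, k, lam) - (lam - 1))
```

All four printed quantities should be zero up to rounding error.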

With this preparation behind us, the proof of Theorem 1 will follow that in [4]
very closely, so we give only an outline. We use the same notation as in [4]
whenever possible.

Proof of Theorem 1. Firstly, we have $\log \beta_t = O(1)$ uniformly in $0 \le t \le n$.
This and (8) imply that the increments $\beta_i^{-1} \Delta_i$ of the martingale $(S_t)$ have
variance $O(1)$, so $\mathrm{Var}(S_t) = O(t)$ for any deterministic $t = t(n)$. Hence, by
Doob's maximal inequality, $\max_{i \le t} |S_i| = O_p(\sqrt{t})$. In particular,

\[ \tilde{X}_t = x_t + \beta_t S_t = f(t) + \beta_t S_t + O(1) = f(t) + O_p(\sqrt{t}). \tag{20} \]
Let $\sigma_0 = \sqrt{\varepsilon n}$, let $\omega = \omega(n)$ tend to infinity slowly (with $\omega^6 = o(\varepsilon^3 n)$), and
let $t_0 = \omega \sigma_0/\varepsilon$. Write $Z = -\inf\{X_t : t \le t_0\}$ for the number of components
completely explored by time $t_0$, and $T_0$ for the time at which we finish exploring
the last such component. Since $f'(0) = g'(0) = \varepsilon$, and we have the bound
$X_t - \tilde{X}_t = O(t C_t/n)$ from Lemma 3, the proof of [4, Lemma 6] goes through
mutatis mutandis to show that $Z \le \sigma_0/\omega$ and $T_0 \le \sigma_0/(\varepsilon \omega)$ hold whp. Note for later that
the latter bound gives $T_0 = o_p(\sqrt{n/\varepsilon})$.

Writing $t_1 = \rho_{k,\lambda} n$ (ignoring the irrelevant rounding to integers), let $T_1$ be
the first time at which we finish exploring a component after time $t_0$. Arguing
exactly as in [4] we see that $t_1 - t_0 \le T_1 \le t_1 + t_0$ holds whp, and indeed that

\[ T_1 = t_1 + \frac{\tilde{X}_{t_1}}{1 - \lambda_*} + o_p(\sigma_0/\varepsilon). \tag{21} \]

(Relation (21) takes the place of equation (21) in [4].)

We claim that for each $t \le t_1$ we have $U_t = u_t + o_p(n)$, where now

\[ u_t = n \exp\left( -\frac{\lambda}{k-1} \bigl( 1 - (1 - t/n)^{k-1} \bigr) \right). \tag{22} \]

Noting that $u_t = n - t - f(t)$, and recalling that $U_t = n - t - A_t = n - t - (X_t + C_t)$,
this can be deduced from the crude bound (20) on $\tilde{X}_t$ and Lemma 3 by arguing
as in [4]. Note that, as pointed out to us (in the graph case) by Lutz Warnke,
the approximate form of this formula is easy to guess: ignoring vertices selected
to start new components, a given vertex $u$ is unseen at time $t$ if and only if none
of the potential edges containing $u$ tested during the first $t$ steps was found to
be present. There are $\binom{n-1}{k-1} - \binom{n-t-1}{k-1}$ such edges: those containing $u$ and at
least one of the $t$ explored vertices.

From (7) and (22), the sum of the conditional variances of the first $t_1$
increments of $(S_t)$ is concentrated around

\[ \sum_{i=1}^{t_1} \beta_i^{-2} \Bigl( \lambda (k-2)(1 - i/n)^{k-3} (u_i/n)^2 + \lambda (1 - i/n)^{k-2} u_i/n \Bigr). \tag{23} \]

Although the distribution of the increments is not quite as nice as in the graph
case, it is easy to see that the conditional distribution of $\eta_t$ given $\mathcal{F}_{t-1}$ is
dominated by $k-1$ times a binomial random variable with mean $O(1)$. (The binomial
random variable is the number of edges found; each contributes a number of
previously unseen vertices between 0 and $k-1$.) Thus any fixed moment of $\eta_t$ is bounded
by a constant, and this transfers to $\Delta_t = \eta_t - \mathbb{E}(\eta_t \mid \mathcal{F}_{t-1})$ and hence to $\beta_t^{-1} \Delta_t$.
This condition (for the fourth moment) and concentration of the sum of the
conditional variances is more than enough for a martingale central limit
theorem such as Brown [5, Theorem 2], and it follows that $S_{t_1}$, and hence $\tilde{X}_{t_1}$ and
thus $T_1$, is asymptotically normally distributed.

For the variance, the sum in (23) is well approximated by an integral, and
after a slightly unpleasant calculation one sees that

\[ \mathrm{Var}(\tilde{X}_{t_1}) = \mathrm{Var}(\beta_{t_1} S_{t_1}) \sim \bigl( \lambda (1-\rho)^2 - \lambda_*(1-\rho) + \rho(1-\rho) \bigr) n = (1 - \lambda_*)^2 \sigma^2, \]
where $\rho = \rho_{k,\lambda}$ and $\sigma = \sigma_{k,\lambda}$ are defined in (2) and (3). Recalling (21), and
noting that $T_0 = o_p(\sqrt{n/\varepsilon}) = o_p(\sigma)$, it follows that $T_1$ and thus $T_1 - T_0$ is
asymptotically normal with mean $\rho n$ and variance $\sigma^2$, so there is a component
whose size has the required distribution.

Finally, it is not hard to check that there is whp no larger component, using
the fact that what remains to be explored after time $T_1$ is a subcritical random
hypergraph. Specifically, one can either apply a martingale argument as in
Nachmias and Peres [12], or simply apply the subcritical case of the results
proved by Karoński and Łuczak [8]. □

Proof of Theorem 2. Suppose that $p = \lambda (k-2)!\, n^{-k+1}$ with $\lambda = 1 + \varepsilon$, where
$\varepsilon = \varepsilon(n)$ satisfies $\varepsilon n^{1/3} \to (k-1)^{2/3} a$ as $n \to \infty$, for some constant $a \in \mathbb{R}$. As
before we consider the random walk $(X_t)$, but now only for $t \le A n^{2/3}$ for a large
constant $A$. Aldous [1] shows in the graph case that, appropriately rescaled, the
process $(X_t)$ converges to a deterministic quadratic plus a standard Brownian
motion. We show the same here, with slightly different rescaling.

The argument giving $x_t = n g(t/n) + O(1)$ above assumed only that $\lambda = O(1)$,
which applies here. Writing $t = s (k-1)^{-1/3} n^{2/3}$, using (19) we see that for
$t \le A n^{2/3}$, we have

\[ \begin{aligned}
x_t / ((k-1)n)^{1/3} &= (k-1)^{-1/3} n^{2/3} g\bigl( s (k-1)^{-1/3} n^{-1/3} \bigr) + o(1) \\
&= n^{1/3} \varepsilon (k-1)^{-2/3} s - s^2/2 + o(1) = as - s^2/2 + o(1).
\end{aligned} \]
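
The rescaling can be checked numerically by evaluating $(k-1)^{-1/3} n^{2/3} g(s (k-1)^{-1/3} n^{-1/3})$ for a large $n$ with $\varepsilon$ chosen exactly on the critical scale. The sketch below is an illustration only: the function name `rescaled_drift` is ours, $\varepsilon$ is set to $(k-1)^{2/3} a n^{-1/3}$ exactly (the theorem only needs this in the limit), and `math.expm1` is used to avoid cancellation for tiny arguments.

```python
import math


def g(tau, k, lam):
    """g_{k,lam}(tau) = 1 - tau - exp(x), written as -tau - expm1(x) for accuracy."""
    x = -lam / (k - 1) * (1 - (1 - tau) ** (k - 1))
    return -tau - math.expm1(x)


def rescaled_drift(s, a, k, n):
    """(k-1)^(-1/3) n^(2/3) g(tau) at tau = s (k-1)^(-1/3) n^(-1/3), lam = 1 + eps."""
    eps = (k - 1) ** (2 / 3) * a * n ** (-1 / 3)
    tau = s * (k - 1) ** (-1 / 3) * n ** (-1 / 3)
    return (k - 1) ** (-1 / 3) * n ** (2 / 3) * g(tau, k, 1 + eps)


if __name__ == "__main__":
    n = 1e12
    for k in (3, 4):
        for a, s in ((1.0, 1.0), (0.5, 2.0), (-0.5, 1.0)):
            # Compare with the limiting parabola a*s - s^2/2.
            print(k, a, s, rescaled_drift(s, a, k, n), a * s - s * s / 2)
```

The two printed columns should agree up to an error that shrinks as $n$ grows, matching the $o(1)$ term in the display above.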

In other words, the deterministic limiting trajectory is quadratic. Moreover, in
this range (indeed, whenever $t = o(n)$) we have $\beta_t \sim 1$ (see (16)) and, from (7),
the martingale differences appearing in (14) have (conditional) variance

\[ \beta_i^{-2} \mathrm{Var}(\Delta_i \mid \mathcal{F}_{i-1}) \sim \mathrm{Var}(\Delta_i \mid \mathcal{F}_{i-1}) = \mathrm{Var}(\eta_i \mid \mathcal{F}_{i-1}) \sim k - 1. \]

It follows using the bounds on $|X_t - x_t - \beta_t S_t|$ established above that the rescaled
process whose value at rescaled time $s$ is $X_{s (k-1)^{-1/3} n^{2/3}} / ((k-1)n)^{1/3}$ converges
to the quadratic function above plus a standard Brownian motion, i.e., to $W^a(s)$.
The rest of the argument is exactly as in the original paper of Aldous [1], so we
omit the details, noting only that it is the time rescaling factor that appears in
the component sizes. □

For slightly more details of a related argument, see [4, Section 4].
in Atlanta. We are grateful to him for raising this question.